System and method for executing SQL basic operators on compressed data without a decompression process

ABSTRACT

The present invention discloses a method for executing an SQL operator on a compressed data chunk. The method comprises the steps of: receiving an SQL operator; accessing compressed data chunk blocks; receiving a full set of derivatives of the compression scheme; checking compression rules, based on the compression scheme and the relevant operator, to approve the SQL operation on the compressed data; and, in case of approval, applying the respective SQL operator to the relevant compressed data chunks.

BACKGROUND

Technical Field

The present invention relates generally to a method for executing SQL basic operators on compressed data without a decompression process.

BRIEF SUMMARY

The present invention discloses a method for executing an SQL operator on a compressed data chunk. The method comprises the steps of: receiving an SQL operator; accessing compressed data chunk blocks; receiving a full set of derivatives of the compression scheme; checking compression rules, based on the compression scheme and the relevant operator, to approve the SQL operation on the compressed data; and, in case of approval, applying the respective SQL operator to the relevant compressed data chunks.

The present invention discloses a method for executing an SQL operator on a compressed data chunk using at least one HWA. The method comprises the steps of: receiving an SQL operator; accessing compressed data chunk blocks; receiving a full set of derivatives of the compression scheme; checking compression rules, based on the compression scheme and the relevant operator, to approve the SQL operation on the compressed data, wherein the compression scheme is at least one of a FOR scheme or a BWT scheme; and, in case of approval, applying the respective SQL operator to the relevant compressed data chunks utilizing multiple threads of the HWA unit.

According to some embodiments of the present invention, the SQL operator is approved in case each uncompressed data unit is an algebraic transformation of the corresponding compressed data unit according to a single offset value.

According to some embodiments of the present invention, the operator is a sort operation, and the sort operation is applied directly to the compressed data.

According to some embodiments of the present invention, the operator is a merge operation, wherein, before the merge operation is applied, offset alignment of all compressed chunks is performed utilizing multiple threads of at least one HWA unit based on a vector/super-scalar architecture, and the merge is applied to the aligned compressed chunks.

According to some embodiments of the present invention, the operator is a join operation, wherein, before the join operation is applied, an algebraic transformation is performed by recalculating the offset values of the relevant data chunks utilizing multiple threads of at least one HWA unit based on a vector/super-scalar architecture, and the join operation is applied to the transformed data of the relevant data chunks.

According to some embodiments of the present invention, the operator is a reduce operation, wherein the reduce operator is approved if it obeys the commutative property, and the reduce operation is applied to the transformed data of the relevant data chunks.

According to some embodiments of the present invention, the operator is a hash function, wherein the hash operator is approved if the hash function is injective not only for the original values but also for the result values, and the hash function is applied to the compressed data, mapping each compressed data unit through the hash function.

According to some embodiments of the present invention, the SQL operators are applied at least partly while the data is being decompressed, wherein parts of the data chunks are decompressed sequentially, one after the other, and the operators are applied to the already decompressed parts.

According to some embodiments of the present invention, the compression scheme is BWT, wherein the decompression process applies multiple threads for analyzing multiple index rows of the BWT result string, making it possible to provide partial decompression results during the decompression process.

The present invention discloses a system for executing an SQL operator on a compressed data chunk using at least one HWA. The system comprises: a database of clustered compressed data chunks, including the compression scheme; at least one HWA unit; at least one CPU unit; and an SQL operators module for receiving an SQL operator, accessing compressed data chunk blocks, receiving a full set of derivatives of the compression scheme, and checking compression rules, based on the compression scheme and the relevant operator, to approve the SQL operation on the compressed data, wherein the compression scheme is at least one of a FOR scheme or a BWT scheme; and, in case of approval, applying the respective SQL operator to the relevant compressed data chunks utilizing multiple threads of the HWA unit.

These, additional, and/or other aspects and/or advantages of the present invention are set forth in the detailed description which follows, possibly inferable from the detailed description, and/or learnable by practice of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be more readily understood from the detailed description of embodiments thereof made in conjunction with the accompanying drawings, of which:

FIG. 1 illustrates a block diagram showing the entities and modules involved in processing user SQL operators on compressed data, according to some embodiments of the invention;

FIG. 2 is a flow diagram of the SQL operators module processing, according to some embodiments of the invention;

FIG. 3 is a flow diagram of the join operator processing, according to some embodiments of the invention;

FIG. 4 is a flow diagram of the hash operator processing, according to some embodiments of the invention;

FIG. 5 is a flow diagram of the sort operator processing, according to some embodiments of the invention;

FIG. 6 is a flow diagram of the merge operator processing, according to some embodiments of the invention;

FIG. 7 is a flow diagram of the reduce operator processing, according to some embodiments of the invention;

FIG. 8 is a flow diagram of the inverse BWT decompression scheme processing, according to some embodiments of the invention; and

FIG. 9 is an example of a matrix created for use in the inverse BWT decompression scheme processing, according to some embodiments of the invention.

DETAILED DESCRIPTION

Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

The term “HWA (HardWare Accelerator)” as used herein in this application is defined as any hardware that is connected to the main Central Processing Unit (CPU) through a Peripheral Component Interconnect (PCI) bus and encompasses multiple computational cores. Examples: GPGPUs (with thousands of cores) and Intel MICs (with tens of cores).

In a typical compression scheme, a data unit of size S0 is transformed into another data unit of constant, predefined size S1 (S0 > S1) according to the predefined full set of derivatives of this compression scheme.

The term “Derivatives” as used herein in this application is defined as the plurality of algebraic operations applied to data units for decompressing a data container consisting of a plurality of compressed data units. Example: for FOR (Frame Of Reference) compression, the derivatives are:

pFOR (Patched FOR), pFORd (patched FOR on Deltas)

Frame of Reference (FOR): FOR determines the range of possible values in a block, called a frame, and maps each value into this range by storing just enough bits to distinguish between the values.
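As an illustration only, the FOR idea can be sketched in Python as follows. The names `ForChunk`, `for_encode` and `for_decode` are hypothetical, the block is assumed to be non-empty, and the frame is assumed to start at the block minimum. Codes are kept in a plain list rather than bit-packed; `bit_width` records how many bits each packed code would need.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForChunk:
    """Hypothetical FOR chunk: each value is stored as base + code."""
    base: int         # frame of reference (here: the block minimum)
    bit_width: int    # bits a packed code would need
    codes: List[int]  # small non-negative offsets from base


def for_encode(values: List[int]) -> ForChunk:
    """Encode a non-empty block of integers with Frame Of Reference."""
    base = min(values)
    codes = [v - base for v in values]
    # At least one bit per code, even if all values are equal.
    bit_width = max(max(codes).bit_length(), 1)
    return ForChunk(base, bit_width, codes)


def for_decode(chunk: ForChunk) -> List[int]:
    """Decompress by adding the single offset back to every code."""
    return [chunk.base + c for c in chunk.codes]
```

For the block `[1003, 1001, 1007, 1002]` this sketch stores base 1001 and the 3-bit codes `[2, 0, 6, 1]`. Because every value is `base + code`, each chunk has the single-offset property that the operator rules below rely on.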

The term “SQL (Structured Query Language)” as used herein in this application is defined as a variant of DSL. Like DSL, it is transformed into a set of Map-Reduce operators to be executed by an MRF. Example: the Apache Hive SQL dialect, called HQL.

Patched Frame Of Reference (PFOR)

PFOR is an extension of FOR that is less vulnerable to outliers in thevalue distribution.

PFOR stores outliers as exceptions, so that the frame of reference [0; max] is greatly reduced.
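A simplified PFOR sketch, under stated assumptions: the caller picks `bit_width`, the frame base is the block minimum (so outliers are assumed to be large values), and exceptions are kept in a dictionary keyed by position rather than in the patch list used by published PFOR layouts. `pfor_encode` and `pfor_decode` are hypothetical names.

```python
from typing import Dict, List, Tuple


def pfor_encode(values: List[int], bit_width: int) -> Tuple[int, List[int], Dict[int, int]]:
    """Simplified PFOR: codes that do not fit in bit_width bits become exceptions."""
    base = min(values)        # assumes outliers are large, not small
    limit = 1 << bit_width    # regular codes must lie in [0, limit)
    codes: List[int] = []
    exceptions: Dict[int, int] = {}
    for pos, v in enumerate(values):
        code = v - base
        if code < limit:
            codes.append(code)
        else:
            codes.append(0)        # placeholder slot keeps positions aligned
            exceptions[pos] = v    # outlier stored verbatim, patched on decode
    return base, codes, exceptions


def pfor_decode(base: int, codes: List[int], exceptions: Dict[int, int]) -> List[int]:
    """Decode the frame, then patch the exception positions."""
    return [exceptions.get(pos, base + c) for pos, c in enumerate(codes)]
```

With `bit_width=3`, the block `[5, 7, 6, 1000, 5]` keeps 3-bit codes for four values and stores only `1000` as an exception, instead of widening every code to the 10 bits that the outlier alone would require.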

Delta encoding in a compression scheme stores the difference between the previous integer and the current one in the uncompressed string, instead of storing the original integer. This allows an ordered list of integers to be encoded as small numbers, which can be stored in fewer bits.
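A minimal delta-encoding sketch (hypothetical function names). For an ordered list, the stored deltas are small numbers even when the original integers are large.

```python
from itertools import accumulate
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """A running (prefix) sum restores the original integers."""
    return list(accumulate(deltas))
```

For example, `[100, 103, 107, 108]` becomes `[100, 3, 4, 1]`. Since decoding is a prefix sum, changing the stored starting value shifts every decoded value by the same amount.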

The term “BWT (Burrows-Wheeler transform)” as used herein in this application is defined as a compression technique which identifies repeated patterns in the data and encodes the duplications more compactly by rearranging the character string into sequences of similar characters.
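A naive reference sketch of the forward transform, assuming a non-empty string and the rotation-sorting formulation without an end-of-string sentinel. It returns the result string together with the index of the row holding the original string, which plays the role of the index integer i used by the inverse process of FIG. 8. It runs in O(n² log n) and is meant only to show what the transform produces. The transform by itself only rearranges characters; the size reduction comes from a subsequent stage (for example, run-length coding) that exploits the resulting runs. `bwt_encode` is a hypothetical name.

```python
from typing import Tuple


def bwt_encode(s: str) -> Tuple[str, int]:
    """Naive BWT: sort all rotations, return the last column and the row of s."""
    n = len(s)
    # Sort rotation start positions by the rotation they denote.
    rotations = sorted(range(n), key=lambda i: s[i:] + s[:i])
    # The last character of the rotation starting at i is s[i - 1].
    last = "".join(s[(i - 1) % n] for i in rotations)
    index = rotations.index(0)  # row holding the original string
    return last, index
```

`bwt_encode("banana")` yields `("nnbaaa", 3)`: the repeated characters are grouped into runs.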

FIG. 1 illustrates a block diagram showing the entities and modules involved in processing a user SQL operator on compressed data chunks, according to some embodiments of the invention. A user 10 initiates an SQL query (11), which is sent to the SQL operators module (18). The SQL operators module runs at least one operator of the query on the compressed data chunk blocks stored on the DBMS, using the HWA (20) or the CPU unit (22). This process eliminates the need to decompress at least part of the data chunks at runtime, thereby accelerating the processing of the operators.

FIG. 2 is a flow diagram of the SQL operators module processing, according to some embodiments of the invention. The module accesses multiple compressed data chunk blocks stored on the DBMS (202) and receives a full set of derivatives of the compression scheme (204). The SQL operators to be applied are identified by checking the user query (206). At the next step, the compression rules relevant to each operator are examined (208). The rules determine whether it is possible to apply the determined operator without decompressing the data chunks. Examples of the rules are detailed below for each type of operator. In case of approval, the respective operator is applied directly to the compressed data chunks, or to an algebraic transformation of them that is relevant for the specific operator and compression scheme (210).
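The rule check of steps 206 to 210 can be pictured as a small dispatch table. The sketch below is a hypothetical encoding of the per-operator rules described for FIGS. 3 to 7: a single offset for sort, merge and join; commutativity plus a single offset for reduce; injectivity for hash. The class and field names are assumptions, and the fallback to decompression for unapproved operators is inferred from the statement that the rules decide whether decompression can be avoided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemeInfo:
    name: str
    single_offset: bool  # every value == code + one offset per chunk


@dataclass(frozen=True)
class OperatorInfo:
    name: str
    commutative: bool = False
    injective_on_results: bool = False  # only meaningful for hash


def approve_on_compressed(op: OperatorInfo, scheme: SchemeInfo) -> bool:
    """Step 208: may op run on compressed chunks without decompression?"""
    if op.name in ("sort", "merge", "join"):
        return scheme.single_offset
    if op.name == "reduce":
        return op.commutative and scheme.single_offset
    if op.name == "hash":
        return op.injective_on_results
    return False  # unknown operators: be conservative


def plan(op: OperatorInfo, scheme: SchemeInfo) -> str:
    """Step 210 if approved, otherwise fall back to decompression."""
    return "compressed" if approve_on_compressed(op, scheme) else "decompress-first"
```

Under these assumptions, a FOR scheme would be described as `SchemeInfo("FOR", True)` and a BWT scheme as `SchemeInfo("BWT", False)`, so a sort is approved on the former and routed to decompression on the latter.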

FIG. 3 is a flow diagram of the join operator processing, according to some embodiments of the invention.

Before performing the join operation, the compression scheme is examined to check whether each uncompressed data unit is an algebraic transformation of the compressed data according to a single offset value (302). If yes, an algebraic transformation is applied to the compressed data chunks by recalculating the offset values of the relevant data chunks (304).

For example, in the pFORd scheme only the deltas between the integers of the string are stored, so the offsets must be changed to a common value and the deltas recalculated accordingly. This recalculation is an algebraic transformation and can be performed by each execution unit in a vector/super-scalar architecture.
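The offset recalculation is easiest to see for plain FOR chunks, where each value is `base + code`. The sketch below (hypothetical names, with plain Python lists standing in for vector registers) moves every chunk to a common offset. It covers plain FOR only and does not model the delta layout of pFORd.

```python
from typing import List, Tuple

Chunk = Tuple[int, List[int]]  # (base, codes); value = base + code


def rebase(chunk: Chunk, common_base: int) -> Chunk:
    """Re-express a chunk's codes relative to common_base (common_base <= base)."""
    base, codes = chunk
    shift = base - common_base
    # Each code is updated independently, so this maps onto one SIMD lane
    # or one accelerator thread per element.
    return common_base, [c + shift for c in codes]


def align(chunks: List[Chunk]) -> List[Chunk]:
    """Bring all chunks to a single common offset (the smallest base)."""
    common = min(base for base, _ in chunks)
    return [rebase(ch, common) for ch in chunks]
```

For instance, aligning `(100, [0, 5])` and `(103, [1, 2])` keeps the first chunk and turns the second into `(100, [4, 5])`; both still decode to the same values as before.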

At the next step, the join operation is performed on the transformed data of the relevant compressed data chunks (306). The join may include finding the intersecting or the non-intersecting area. The advantage of applying the join operation to transformed data units, without decompressing the data chunk at runtime, is reduced memory consumption throughout the process.
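Once both sides share an offset, equal values have equal codes, so the join predicate can be evaluated on the codes directly. This set-based sketch covers key matching only (hypothetical names; duplicate keys and payload columns are ignored) and returns both the intersecting and the non-intersecting parts.

```python
from typing import List, Tuple

Chunk = Tuple[int, List[int]]  # (base, codes); value = base + code


def join_on_codes(left: Chunk, right: Chunk) -> Tuple[int, List[int], List[int]]:
    """Match join keys of two FOR chunks without decoding them.

    Returns (common_base, matched_codes, left_only_codes); each result code
    still decodes as common_base + code.
    """
    (lb, lc), (rb, rc) = left, right
    common = min(lb, rb)
    # Algebraic transformation: shift both chunks to the same offset.
    left_codes = {c + lb - common for c in lc}
    right_codes = {c + rb - common for c in rc}
    matched = sorted(left_codes & right_codes)    # intersecting area
    left_only = sorted(left_codes - right_codes)  # non-intersecting area
    return common, matched, left_only
```

Joining `(100, [0, 5, 7])` (values 100, 105, 107) with `(103, [2, 4, 9])` (values 105, 107, 112) gives matched codes `[5, 7]`, i.e. values 105 and 107, without decoding either side.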

FIG. 4 is a flow diagram of the hash operator processing, according to some embodiments of the invention. In most cases, a hash operation on compressed data is not possible.

In case the hash function is injective not only for the original values but also for the result values, the hash function can be applied to the compressed data chunks by mapping each compressed data unit through the unmodified hash function (404).

If the hashing process is used for join or reduce as follow-up operations, an algebraic transformation of the compressed data units can optionally be performed as described above for FIG. 3 (an algebraic transformation of the compressed data according to a single offset value). In the general case, each compressed unit must be decompressed before the hashing operation.
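For a hash that feeds a join or reduce, the relevant requirement is that equal values land in the same bucket. With one common offset, subtracting the offset is one-to-one, so equal values produce equal aligned codes and therefore equal hashes. A hypothetical sketch of that single-offset case only (the general case still needs decompression), using Python's built-in `hash`:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Chunk = Tuple[int, List[int]]  # (base, codes); value = base + code


def hash_partition(chunks: List[Chunk], n_buckets: int) -> Tuple[int, Dict[int, List[int]]]:
    """Bucket rows by hashing their codes after aligning to one offset."""
    common = min(base for base, _ in chunks)
    buckets: Dict[int, List[int]] = defaultdict(list)
    for base, codes in chunks:
        for c in codes:
            aligned = c + base - common  # equal values -> equal aligned codes
            buckets[hash(aligned) % n_buckets].append(aligned)
    return common, dict(buckets)
```

With chunks `(10, [0, 3])` and `(12, [1])`, the value 13 appears in both chunks; after alignment both occurrences carry code 3 and end up in the same bucket.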

FIG. 5 is a flow diagram of the sort operator processing, according to some embodiments of the invention.

Before performing the sort operation, the compression scheme is examined to check whether each uncompressed data unit is an algebraic transformation of the compressed data unit according to a single offset value (502).

At the next step, the sort operation is applied directly to the compressed data units of the relevant data chunks (504).
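A single offset is sufficient for sorting because adding the same constant to every element preserves their order. A minimal sketch for one FOR chunk (hypothetical name), which also returns the row permutation so that other columns could be reordered the same way:

```python
from typing import List, Tuple


def sort_for_chunk(base: int, codes: List[int]) -> Tuple[int, List[int], List[int]]:
    """Sort a FOR chunk on its codes; the offset is unchanged.

    Because value = base + code with one base per chunk, the order of the
    codes is the order of the values.
    """
    order = sorted(range(len(codes)), key=codes.__getitem__)
    return base, [codes[i] for i in order], order
```

For base 1000 and codes `[7, 2, 5]`, the sorted codes are `[2, 5, 7]` (values 1002, 1005, 1007) with permutation `[1, 2, 0]`.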

FIG. 6 is a flow diagram of the merge operator processing, according to some embodiments of the invention.

Before performing the merge operation, the compression scheme is examined to check whether each uncompressed data unit is an algebraic transformation of the compressed data unit according to a single offset value (602). If yes, an algebraic transformation is applied to the compressed data chunks by recalculating the offset values of the relevant data chunks (604). At the next step, the merge operation is applied to the transformed data of the relevant compressed data chunks (606).
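A sketch of steps 604 and 606 for two already-sorted FOR chunks, using Python's `heapq.merge` for the two-way merge. The chunk layout and function name are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple

Chunk = Tuple[int, List[int]]  # (base, sorted codes); value = base + code


def merge_for_chunks(a: Chunk, b: Chunk) -> Chunk:
    """Merge two sorted FOR chunks after aligning them to one offset."""
    (ab, ac), (bb, bc) = a, b
    common = min(ab, bb)
    # Offset alignment (step 604): shifting keeps the order within each chunk.
    a_codes = [c + ab - common for c in ac]
    b_codes = [c + bb - common for c in bc]
    # Merge (step 606) runs on codes; the result decodes as common + code.
    return common, list(heapq.merge(a_codes, b_codes))
```

Merging `(100, [0, 4, 9])` with `(102, [1, 3])` gives `(100, [0, 3, 4, 5, 9])`, which decodes to 100, 103, 104, 105, 109.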

FIG. 7 is a flow diagram of the reduce operator processing, according to some embodiments of the invention.

In case the reduce operator obeys the commutative property (704), it is checked whether, according to the compression scheme, each uncompressed data unit is an algebraic transformation according to a single offset value (708). If yes, the reduce operation is performed directly on the compressed data (710).
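For a FOR chunk, common commutative reductions can be rewritten in terms of the codes and the single offset, for example sum(base + c) = n*base + sum(c). The sketch below (hypothetical names, with a fixed set of operators chosen for illustration) evaluates per-chunk partial results and then combines them. Because each combining function is commutative and associative, the partial results can be combined in any order, as independent threads would produce them.

```python
from functools import reduce
from typing import List, Tuple

Chunk = Tuple[int, List[int]]  # (base, codes); value = base + code


def reduce_chunk(chunk: Chunk, op: str) -> int:
    """Evaluate a commutative reduce on one FOR chunk without decoding rows."""
    base, codes = chunk
    if op == "sum":
        return base * len(codes) + sum(codes)  # sum(base + c) = n*base + sum(c)
    if op == "min":
        return base + min(codes)
    if op == "max":
        return base + max(codes)
    if op == "count":
        return len(codes)
    raise ValueError(f"unsupported reduce operator: {op}")


def reduce_chunks(chunks: List[Chunk], op: str) -> int:
    """Combine per-chunk partial results; the order does not matter."""
    combine = {"sum": lambda x, y: x + y, "count": lambda x, y: x + y,
               "min": min, "max": max}[op]
    return reduce(combine, (reduce_chunk(ch, op) for ch in chunks))
```

For the chunk `(1000, [3, 1, 7, 2])`, the sum is 4 * 1000 + 13 = 4013, the same as summing the decoded values 1003, 1001, 1007 and 1002.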

FIG. 8 is a flow diagram of the inverse BWT decompression scheme processing, according to some embodiments of the invention.

Based on the result string received from the BWT compression process, two index vectors of the BWT result string are defined: a first index according to the order of the result string received from the BWT process, and a second index according to alphabetic order (step 802). A third vector defines an indicator (0 or 1) for each row, specifying whether the row takes part in the shifting process described below. At the first step, a single row is indicated (receiving the value 1) according to the index integer i that represents the position of the original input (received from the BWT compression process), and the selected row is shifted to the top (step 804). At the end of this definition process, a matrix is created that includes the indication vector, the first index column, the result string column, the second index, and the result string in alphabetic order (see FIG. 9).

At each phase of the algorithm, a simultaneous cyclic shifting of the rows of one column of the matrix is performed, shifting only the rows indicated by the indicator vector. The shifting is performed by multiple threads of the HWA units until at least one value of the first index is equal to a value of the second index in the preceding row (step 806).

At the end of each phase, the algorithm checks whether all values of the first index are equal to the values of the second index in the preceding rows (step 808). If yes, the algorithm ends, and the order of the string in the third/fifth column is the original string before compression (step 812).

If not, the indication vector is updated: for all rows where the value of the first index is equal to a value of the second index in the preceding row, the value is set to 1 (step 810), and the column to be shifted is switched (step 814).
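When validating a parallel implementation of FIG. 8, a sequential reference inverse is useful. The sketch below is not the matrix-shifting procedure described above; it is the standard LF-mapping inversion of the Burrows-Wheeler transform, substituted here as a well-known baseline. It assumes the rotation-based transform without an end-of-string sentinel, with `index` being the row of the original string (the index integer i above). `bwt_decode` is a hypothetical name.

```python
def bwt_decode(last: str, index: int) -> str:
    """Invert a rotation-based BWT given the result string and original row."""
    n = len(last)
    # A stable sort of positions by character lists the first column F in
    # order: F[j] is the same character occurrence as last[order[j]].
    order = sorted(range(n), key=lambda i: last[i])
    out = []
    row = index
    for _ in range(n):
        # Row whose last character is the first character of the current row;
        # that row's rotation starts one position later in the original text.
        row = order[row]
        out.append(last[row])  # emit the current character, then advance
    return "".join(out)
```

For example, `bwt_decode("nnbaaa", 3)` recovers `"banana"`. Because each step depends on the previous row, this baseline is inherently sequential per string.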

SQL operators can be processed on the indicated rows while the decompression is still in progress, thus accelerating query processing.
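The general idea of running operators on already decompressed parts can be shown with Python generators. This hypothetical sketch uses FOR chunks rather than BWT rows for brevity: each chunk is decoded only when the consumer asks for it, and a filter runs on that part before the next chunk is touched.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Chunk = Tuple[int, List[int]]  # (base, codes); value = base + code


def decompress_parts(chunks: Iterable[Chunk]) -> Iterator[List[int]]:
    """Decompress chunks one after the other, yielding each part when ready."""
    for base, codes in chunks:
        yield [base + c for c in codes]


def filter_while_decompressing(chunks: Iterable[Chunk],
                               predicate: Callable[[int], bool]) -> Iterator[int]:
    """Apply a filter to each decompressed part before the next is decoded."""
    for part in decompress_parts(chunks):
        yield from (v for v in part if predicate(v))
```

Filtering `[(10, [0, 5]), (20, [1, 3])]` with `v > 12` yields 15 from the first part, then 21 and 23 once the second part has been decoded.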

In the above description, an embodiment is an example or implementation of the invention. The various appearances of “one embodiment”, “an embodiment” or “some embodiments” do not necessarily all refer to the same embodiments.

Although various features of the invention may be described in the context of a single embodiment, the features may also be provided separately or in any suitable combination. Conversely, although the invention may be described herein in the context of separate embodiments for clarity, the invention may also be implemented in a single embodiment. Furthermore, it is to be understood that the invention can be carried out or practiced in various ways and that the invention can be implemented in embodiments other than the ones outlined in the description above.

The invention is not limited to those diagrams or to the corresponding descriptions. For example, flow need not move through each illustrated box or state, or in exactly the same order as illustrated and described.

Meanings of technical and scientific terms used herein are to be commonly understood as by one of ordinary skill in the art to which the invention belongs, unless otherwise defined.

What is claimed is:
1. A method for executing an SQL operator on a compressed data chunk using at least one HWA, said method comprising the steps of: receiving an SQL operator; accessing compressed data chunk blocks; receiving a full set of derivatives of the compression scheme; checking compression rules based on the compression scheme and the relevant operator for approving the SQL operation on the compressed data, wherein the compression scheme is at least one of: a FOR scheme or a BWT scheme; and, in case of approval, applying the respective SQL operator utilizing multiple threads of the HWA unit on the relevant compressed data chunks.
2. The method of claim 1, wherein the SQL operator is approved in case each uncompressed data unit is an algebraic transformation of the corresponding compressed data unit according to a single offset value.
3. The method of claim 2, wherein the operator is a sort operation and the sort operation is applied directly on the compressed data.
4. The method of claim 2, wherein the operator is a merge operation, wherein, before the merge operation is applied, offset alignment of all compressed chunks is performed utilizing multiple threads of at least one HWA unit based on a vector/super-scalar architecture, and the merge is applied on the aligned compressed chunks.
5. The method of claim 2, wherein the operator is a join operation, wherein, before the join operation is applied, an algebraic transformation is performed by recalculating the offset values of the relevant data chunks utilizing multiple threads of at least one HWA unit based on a vector/super-scalar architecture, and the join operation is applied on the transformed data of the relevant data chunks.
6. The method of claim 2, wherein the operator is a reduce operation, wherein the reduce operator is approved if the reduce operator obeys the commutative property, and the reduce operation is applied on the transformed data of the relevant data chunks.
7. The method of claim 1, wherein the operator is a hash function, wherein the hash operator is approved if the hash function is injective not only for original values but also for result values, and the hash function is applied on the compressed data for mapping each compressed data unit through the hash function.
8. The method of claim 1, wherein the SQL operators are applied at least partly while the data is being decompressed, wherein parts of the data chunks are decompressed sequentially one after the other, and wherein the operators are applied on the already decompressed parts.
9. The method of claim 8, wherein the compression scheme is BWT, and wherein the decompression process applies multiple threads for analyzing multiple index rows of the BWT result string, enabling partial decompression results to be provided during the decompression process.
10. A system for executing an SQL operator on a compressed data chunk using at least one HWA, said system comprising: a database of clustered compressed data chunks including the compression scheme; at least one HWA unit; at least one CPU unit; and an SQL operators module for receiving an SQL operator, accessing compressed data chunk blocks, receiving a full set of derivatives of the compression scheme, and checking compression rules based on the compression scheme and the relevant operator for approving the SQL operation on the compressed data, wherein the compression scheme is at least one of a FOR scheme or a BWT scheme; and, in case of approval, applying the respective SQL operator utilizing multiple threads of the HWA unit on the relevant compressed data chunks.
11. The system of claim 10, wherein the SQL operator is approved in case each uncompressed data unit is an algebraic transformation of the corresponding compressed data unit according to a single offset value.
12. The system of claim 11, wherein the operator is a sort operation and the sort operation is applied directly on the compressed data.
13. The system of claim 11, wherein the operator is a merge operation, wherein, before the merge operation is applied, offset alignment of all compressed chunks is performed utilizing multiple threads of at least one HWA unit based on a vector/super-scalar architecture, and the merge is applied on the aligned compressed chunks.
14. The system of claim 11, wherein the operator is a join operation, wherein, before the join operation is applied, an algebraic transformation is performed by recalculating the offset values of the relevant data chunks utilizing multiple threads of at least one HWA unit based on a vector/super-scalar architecture, and the join operation is applied on the transformed data of the relevant data chunks.
15. The system of claim 11, wherein the operator is a reduce operation, wherein the reduce operator is approved if the reduce operator obeys the commutative property, and the reduce operation is applied on the transformed data of the relevant data chunks.
16. The system of claim 10, wherein the operator is a hash function, wherein the hash operator is approved if the hash function is injective not only for original values but also for result values, and the hash function is applied on the compressed data for mapping each compressed data unit through the hash function.
17. The system of claim 10, wherein the SQL operators are applied at least partly while the data is being decompressed, wherein parts of the data chunks are decompressed sequentially one after the other, and wherein the operators are applied on the already decompressed parts.
18. The system of claim 17, wherein the compression scheme is BWT, and wherein the decompression process applies multiple threads for analyzing multiple index rows of the BWT result string, enabling partial decompression results to be provided during the decompression process.